Predicting Images using Convolutional Networks: Visual Scene Understanding with Pixel Maps
نویسندگان
چکیده
In the greater part of this thesis, we develop a set of convolutional networks that infer predictions at each pixel of an input image. This is a common problem that arises in many computer vision applications: For example, predicting a semantic label at each pixel describes not only the image content, but also fine-grained locations and segmentations; at the same time, finding depth or surface normals provide 3D geometric relations between points. The second part of this thesis investigates convolutional models also in the contexts of classification and unsupervised learning. To address our main objective, we develop a versatile Multi-Scale Convolutional Network that can be applied to diverse vision problems using simple adaptations, and apply it to predict depth at each pixel, surface normals and semantic labels. Our model uses a series of convolutional network stacks applied at progressively finer scales. The first uses the entire image field of view to predict a spatially coarse set of feature maps based on global relations; subsequent scales correct and refine the output, yielding a high resolution prediction. We look exclusively at depth prediction first, then generalize our method to multiple tasks. Our system achieves state-of-the-art results on all tasks we investigate, and can match many image details without the need for superpixelation. Leading to our multi-scale network, we also design a purely local convolutional network to remove dirt and raindrops present on a window surface, which learns to identify and inpaint compact corruptions. We also we investigate a weighted nearest-neighbors labeling system applied to superpixels, in which we learn weights for each example, and use local context to find rare class instances. In addition, we investigate the relative importance of sizing parameters using a recursive convolutional network, finding that network depth is most critical. We also develop a Convolutional LISTA Autoencoder, which learns features similar to stacked sparse coding at a fraction of the cost, combine it with a local entropy objective, and describe a convolutional adaptation of ZCA whitening.
منابع مشابه
An efficient method for cloud detection based on the feature-level fusion of Landsat-8 OLI spectral bands in deep convolutional neural network
Cloud segmentation is a critical pre-processing step for any multi-spectral satellite image application. In particular, disaster-related applications e.g., flood monitoring or rapid damage mapping, which are highly time and data-critical, require methods that produce accurate cloud masks in a short time while being able to adapt to large variations in the target domain (induced by atmospheric c...
متن کاملDeep Context Convolutional Neural Networks for Semantic Segmentation
Recent years have witnessed the great progress for semantic segmentation using deep convolutional neural networks (DCNNs). This paper presents a novel fully convolutional network for semantic segmentation using multi-scale contextual convolutional features. Since objects in natural images tend to be with various scales and aspect ratios, capturing the rich contextual information is very critica...
متن کاملBeyond saliency: understanding convolutional neural networks from saliency prediction on layer-wise relevance propagation
Despite the tremendous achievements of deep convolutional neural networks (CNNs) in most of computer vision tasks, understanding how they actually work remains a significant challenge. In this paper, we propose a novel two-step visualization method that aims to shed light on how deep CNNs recognize images and the objects therein. We start out with a layer-wise relevance propagation (LRP) step w...
متن کاملScene Parsing by Weakly Supervised Learning with Image Descriptions
This paper investigates a fundamental problem of scene understanding: how to parse a scene image into a structured configuration (i.e., a semantic object hierarchy with object interaction relations). We propose a deep architecture consisting of two networks: i) a convolutional neural network (CNN) extracting the image representation for pixel-wise object labeling and ii) a recursive neural netw...
متن کاملEfficient Convolutional Patch Networks for Scene Understanding
In this paper, we present convolutional patch networks, which are convolutional (neural) networks (CNN) learned to distinguish different image patches and which can be used for pixel-wise labeling. We show how to easily learn spatial priors for certain categories jointly with their appearance. Experiments for urban scene understanding demonstrate state-of-the-art results on the LabelMeFacade da...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015